Stock Price Prediction is a machine learning project that aims to predict the future value of a stock from its past performance and other market conditions. The goal is to analyze financial data (historical stock prices, news articles, economic indicators, and other relevant information) and build a model that predicts the direction of stock prices. Investors can use the resulting predictions to inform decisions about buying or selling stocks.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.offline import plot
from matplotlib import rcParams
# Default figure size (width, height) in inches for matplotlib plots
rcParams['figure.figsize'] = 20, 10
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense
SP=pd.read_csv("SP data.csv")
SP
| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2018-02-05 | 262.000000 | 267.899994 | 250.029999 | 254.259995 | 254.259995 | 11896100 |
| 1 | 2018-02-06 | 247.699997 | 266.700012 | 245.000000 | 265.720001 | 265.720001 | 12595800 |
| 2 | 2018-02-07 | 266.579987 | 272.450012 | 264.329987 | 264.559998 | 264.559998 | 8981500 |
| 3 | 2018-02-08 | 267.079987 | 267.619995 | 250.000000 | 250.100006 | 250.100006 | 9306700 |
| 4 | 2018-02-09 | 253.850006 | 255.800003 | 236.110001 | 249.470001 | 249.470001 | 16906900 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1004 | 2022-01-31 | 401.970001 | 427.700012 | 398.200012 | 427.140015 | 427.140015 | 20047500 |
| 1005 | 2022-02-01 | 432.959991 | 458.480011 | 425.540009 | 457.130005 | 457.130005 | 22542300 |
| 1006 | 2022-02-02 | 448.250000 | 451.980011 | 426.480011 | 429.480011 | 429.480011 | 14346000 |
| 1007 | 2022-02-03 | 421.440002 | 429.260010 | 404.279999 | 405.600006 | 405.600006 | 9905200 |
| 1008 | 2022-02-04 | 407.309998 | 412.769989 | 396.640015 | 410.170013 | 410.170013 | 7782400 |
1009 rows × 7 columns
SP.head(10)
| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2018-02-05 | 262.000000 | 267.899994 | 250.029999 | 254.259995 | 254.259995 | 11896100 |
| 1 | 2018-02-06 | 247.699997 | 266.700012 | 245.000000 | 265.720001 | 265.720001 | 12595800 |
| 2 | 2018-02-07 | 266.579987 | 272.450012 | 264.329987 | 264.559998 | 264.559998 | 8981500 |
| 3 | 2018-02-08 | 267.079987 | 267.619995 | 250.000000 | 250.100006 | 250.100006 | 9306700 |
| 4 | 2018-02-09 | 253.850006 | 255.800003 | 236.110001 | 249.470001 | 249.470001 | 16906900 |
| 5 | 2018-02-12 | 252.139999 | 259.149994 | 249.000000 | 257.950012 | 257.950012 | 8534900 |
| 6 | 2018-02-13 | 257.290009 | 261.410004 | 254.699997 | 258.269989 | 258.269989 | 6855200 |
| 7 | 2018-02-14 | 260.470001 | 269.880005 | 260.329987 | 266.000000 | 266.000000 | 10972000 |
| 8 | 2018-02-15 | 270.029999 | 280.500000 | 267.630005 | 280.269989 | 280.269989 | 10759700 |
| 9 | 2018-02-16 | 278.730011 | 281.959991 | 275.690002 | 278.519989 | 278.519989 | 8312400 |
print(SP[SP.isnull().any(axis=1)])
Empty DataFrame
Columns: [Date, Open, High, Low, Close, Adj Close, Volume]
Index: []
# Look up rows by value; 'SP' is the DataFrame loaded above
desired_value = 275.690002
# Keep only the rows whose 'Low' equals the desired value
# (exact float equality: the literal must match the stored value exactly)
filtered_data = SP[SP['Low'] == desired_value]
# Print the filtered data
print(filtered_data)
# Second form: filter inline on the integer 'Volume' column to find a specific row
value = SP[SP['Volume'] == 10972000]
print(value)
Date Open High Low Close Adj Close \
9 2018-02-16 278.730011 281.959991 275.690002 278.519989 278.519989
Volume
9 8312400
Date Open High Low Close Adj Close Volume
7 2018-02-14 260.470001 269.880005 260.329987 266.0 266.0 10972000
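Exact equality on a float column, as in the Low filter above, only matches when the literal is identical to the stored value. A tolerance-based comparison is a more forgiving alternative. The following is a minimal sketch on a small hypothetical frame (demo, with values copied from rows displayed above); the target 275.69 and the tolerance atol=1e-5 are assumptions chosen for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; 'Low' values copied from rows shown above.
demo = pd.DataFrame({
    "Date": ["2018-02-14", "2018-02-15", "2018-02-16"],
    "Low": [260.329987, 267.630005, 275.690002],
})

# Target typed with fewer decimals than the stored value.
target = 275.69

# Exact equality finds nothing because 275.69 != 275.690002.
exact_match = demo[demo["Low"] == target]

# np.isclose accepts values within an absolute tolerance (assumed 1e-5 here).
close_match = demo[np.isclose(demo["Low"], target, rtol=0.0, atol=1e-5)]

print(len(exact_match))
print(close_match)
```

For integer columns such as Volume, exact equality is fine, so the tolerance is only worth considering for the float price columns.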
SP.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1009 entries, 0 to 1008
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Date       1009 non-null   object
 1   Open       1009 non-null   float64
 2   High       1009 non-null   float64
 3   Low        1009 non-null   float64
 4   Close      1009 non-null   float64
 5   Adj Close  1009 non-null   float64
 6   Volume     1009 non-null   int64
dtypes: float64(5), int64(1), object(1)
memory usage: 55.3+ KB
SP['Date'] = pd.to_datetime(SP['Date'])
print(f'Dataframe contains SP data between {SP.Date.min()} and {SP.Date.max()}')
Dataframe contains SP data between 2018-02-05 00:00:00 and 2022-02-04 00:00:00
print(f'Total days = {(SP.Date.max() - SP.Date.min()).days}')
Total days = 1460
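The figure printed above is the number of calendar days between the first and last date, not the number of rows: the table has 1009 rows because markets are closed on weekends and holidays. A small sketch, using only the two endpoint dates reported above, makes the distinction concrete; the variable names are illustrative and not part of the original notebook.

```python
import pandas as pd

# Endpoint dates as reported by SP.Date.min() and SP.Date.max() above.
start = pd.Timestamp("2018-02-05")
end = pd.Timestamp("2022-02-04")

# Elapsed calendar days between the two dates.
calendar_days = (end - start).days

# Monday-to-Friday dates in the range, both endpoints included.
weekdays = len(pd.bdate_range(start, end))

print(calendar_days, weekdays)
```

The weekday count is still higher than the 1009 rows in the data; the remaining gap is presumably exchange holidays, which pd.bdate_range does not exclude by default.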
SP.describe()
| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| count | 1009.000000 | 1009.000000 | 1009.000000 | 1009.000000 | 1009.000000 | 1.009000e+03 |
| mean | 419.059673 | 425.320703 | 412.374044 | 419.000733 | 419.000733 | 7.570685e+06 |
| std | 108.537532 | 109.262960 | 107.555867 | 108.289999 | 108.289999 | 5.465535e+06 |
| min | 233.919998 | 250.649994 | 231.229996 | 233.880005 | 233.880005 | 1.144000e+06 |
| 25% | 331.489990 | 336.299988 | 326.000000 | 331.619995 | 331.619995 | 4.091900e+06 |
| 50% | 377.769989 | 383.010010 | 370.880005 | 378.670013 | 378.670013 | 5.934500e+06 |
| 75% | 509.130005 | 515.630005 | 502.529999 | 509.079987 | 509.079987 | 9.322400e+06 |
| max | 692.349976 | 700.989990 | 686.090027 | 691.690002 | 691.690002 | 5.890430e+07 |
SP[['Open','High','Low','Close','Adj Close']].plot(kind='box')
<Axes: >
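layout = go.Layout(title='Stock price plot')
SP_data = [{'x': SP['Date'], 'y': SP['Close']}]
# Pass the layout so the title is shown; the name 'fig' avoids shadowing plotly.offline.plot
fig = go.Figure(data=SP_data, layout=layout)
fig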
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error as mse
from sklearn.metrics import r2_score
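# Feature: the row index (day number); target: the closing price
X = np.array(SP.index).reshape(-1, 1)
Y = SP['Close']
# Random 70/30 split; this shuffles days rather than holding out the most recent ones
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=101)
# Fit a scaler on the training features (fitted here, not applied before the regression below)
scaler = StandardScaler().fit(X_train)
scaler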
StandardScaler()
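Fitting a StandardScaler only learns the mean and standard deviation of the training features; the data is not changed until transform is called. The sketch below shows the usual fit-on-train, transform-both pattern on hypothetical arrays (X_train_demo and X_test_demo stand in for X_train and X_test and are not from the notebook).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical day indices standing in for X_train / X_test.
X_train_demo = np.array([[0.0], [2.0], [4.0], [6.0]])
X_test_demo = np.array([[1.0], [8.0]])

# Learn mean and standard deviation from the training data only.
scaler_demo = StandardScaler().fit(X_train_demo)

# Apply the same learned statistics to both splits.
X_train_scaled = scaler_demo.transform(X_train_demo)
X_test_scaled = scaler_demo.transform(X_test_demo)

print(X_train_scaled.ravel())
print(X_test_scaled.ravel())
```

For ordinary least squares on a single feature, scaling does not change the fitted predictions, so this step matters mainly for models that are sensitive to feature scale.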
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, Y_train)
LinearRegression()
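The split used above is random, so test days are interleaved with training days. For time-ordered data, a common alternative is a chronological split that holds out the most recent observations. This is a different technique from the one the notebook uses; the sketch below illustrates it on hypothetical arrays (X_demo, y_demo) using the shuffle=False option of train_test_split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical series of 10 days with a made-up price for each day.
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.linspace(100.0, 109.0, 10)

# shuffle=False keeps the original order: the first 70% of rows go to
# training and the last 30% to testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, shuffle=False
)

print(X_tr.ravel())
print(X_te.ravel())
```

With this kind of split the test score measures extrapolation to later dates, which is usually a harder task than the interpolation a random split measures.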
trace0 = go.Scatter(
x = X_train.T[0],
y = Y_train,
mode = 'markers',
name = 'Actual'
)
trace1 = go.Scatter(
x = X_train.T[0],
y = lm.predict(X_train).T,
mode = 'lines',
name = 'Predicted'
)
SP_data = [trace0,trace1]
layout.xaxis.title.text = 'Day'
plot2 = go.Figure(data=SP_data, layout=layout)
plot2
The next cell builds an f-string, scores, that reports two metrics for the linear regression model (lm) on the training set (X_train and Y_train) and the test set (X_test and Y_test): the R^2 score and the mean squared error (MSE). The results of the r2_score and mse functions are embedded directly in the f-string. The ljust and center methods align the header and the metric names, while the values themselves are separated by a tab. Printing scores displays a small table with one row per metric and one column each for the training and test sets.
scores = f'''
{'Metric'.ljust(10)}{'Train'.center(20)}{'Test'.center(20)}
{'r2_score'.ljust(10)}{r2_score(Y_train, lm.predict(X_train))}\t{r2_score(Y_test, lm.predict(X_test))}
{'MSE'.ljust(10)}{mse(Y_train, lm.predict(X_train))}\t{mse(Y_test, lm.predict(X_test))}
'''
print(scores)
Metric           Train               Test
r2_score  0.6992669032944175    0.7261648669848495
MSE       3403.003880002517     3460.9885809580633
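To make the two metrics in the table concrete, the sketch below computes them by hand from their standard definitions: MSE is the mean of the squared residuals, and R^2 is one minus the ratio of the residual sum of squares to the total sum of squares. The arrays y_true and y_pred are made-up values for illustration, not data from the notebook.

```python
import numpy as np

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

# Residuals: actual minus predicted.
residuals = y_true - y_pred

# Mean squared error: average of the squared residuals.
mse_manual = np.mean(residuals ** 2)

# R^2 = 1 - SS_res / SS_tot, where SS_tot measures spread around the mean.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, r2_manual)
```

Because MSE is in squared price units, taking its square root gives an error on the same scale as the prices; for the values reported above that is roughly 58 for both the training and test sets.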